The taraXÜ corpus of human-annotated machine translations

Authors

  • Eleftherios Avramidis
  • Aljoscha Burchardt
  • Sabine Hunsicker
  • Maja Popovic
  • Cindy Tscherwinka
  • David Vilar
  • Hans Uszkoreit
Abstract

Human translators are the key to evaluating machine translation (MT) quality and to addressing the so far unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed, large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification, and post-editing.


Similar resources

SubCo: A Learner Translation Corpus of Human and Machine Subtitles

In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resour...


Compiling and Using a Shareable Parallel Corpus for Machine Translation Evaluation

TECMATE is a dynamic TEchnical Corpus for MAchine Translation Evaluation currently being compiled and used at the University of Leeds. A purpose-built corpus for machine translation (MT) evaluation differs in terms of size and content from corpora used for other kinds of linguistic analysis. For example, our research in automated MT evaluation requires source texts with human and machine transl...


A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation

In recent years, machine translation (MT) research has focused on investigating how hybrid machine translation as well as system combination approaches can be designed so that the resulting hybrid translations show an improvement over the individual “component” translations. As a first step towards achieving this objective we have developed a parallel corpus with source text and the correspondi...


Towards Optimal Choice Selection for Improved Hybrid Machine Translation

In recent years, machine translation (MT) research has focused on investigating how hybrid MT as well as MT combination systems can be designed so that the resulting translations give an improvement over the individual translations. As a first step towards achieving this objective we have developed a parallel corpus with source data and the output of a number of MT systems, annotated with metadata i...


Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

We present our guidelines and annotation procedure to create a human-corrected, post-edited machine translation corpus for Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can be used to help accelerate the human revision process of translated texts. The creation of any manually annotated ...




Publication date: 2014